AITopics | musical note

SING: Symbol-to-Instrument Neural Generator

Alexandre Defossez, Neil Zeghidour, Nicolas Usunier, Leon Bottou, Francis Bach

Neural Information Processing Systems, Feb-12-2026, 21:12:35 GMT

Neural Information Processing Systems http://nips.cc/

instrument, spectrogram, waveform, (16 more...)

Country:

North America > United States (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Probing Audio-Generation Capabilities of Text-Based Language Models

Anbazhagan, Arjun Prasaath, Kumar, Parteek, Kaur, Ujjwal, Akalin, Aslihan, Zhu, Kevin, O'Brien, Sean

arXiv.org Artificial Intelligence, Jun-3-2025

How does the textual representation of audio relate to Large Language Models' (LLMs) learning about the audio world? This research investigates the extent to which LLMs can be prompted to generate audio, despite their primary training on textual data. We employ a three-tier approach, progressively increasing the complexity of audio generation: 1) Musical Notes, 2) Environmental Sounds, and 3) Human Speech. To bridge the gap between text and audio, we leverage code as an intermediary, prompting LLMs to generate code that, when executed, produces the desired audio output. To evaluate the quality and accuracy of the generated audio, we employ FAD and CLAP scores. Our findings reveal that while LLMs can generate basic audio features, their performance deteriorates as the complexity of the audio increases. This suggests that while LLMs possess a latent understanding of the auditory world, their ability to translate this understanding into tangible audio output remains rudimentary. Further research into techniques that enhance the quality and diversity of LLM-generated audio could improve the performance of text-based LLMs in generating audio.
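To make the "code as an intermediary" idea concrete for the simplest tier (musical notes), here is a minimal sketch of the kind of program an LLM might be prompted to write. It is not taken from the paper: the function names, the 44.1 kHz sample rate, and the pure sine tone are all illustrative assumptions.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; not specified in the abstract)


def note_frequency(midi_note: int) -> float:
    """Equal-temperament frequency in Hz, taking A4 (MIDI note 69) as 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def sine_samples(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> list[int]:
    """Return signed 16-bit PCM samples of a pure sine tone."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


def write_wav(path: str, samples: list[int]) -> None:
    """Write mono 16-bit samples to a WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Calling `write_wav("a4.wav", sine_samples(note_frequency(69), 1.0))` would produce a one-second A4 tone; environmental sounds and speech need far richer synthesis than a single sine wave.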

large language model, machine learning, natural language, (19 more...)

2506.00003

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unrolled Creative Adversarial Network For Generating Novel Musical Pieces

Nag, Pratik

arXiv.org Artificial Intelligence, Dec-31-2024

Music generation has been established as a prominent topic in artificial intelligence and machine learning over recent years. In most recent works, RNN-based neural network methods have been applied for sequence generation. In contrast, generative adversarial networks (GANs) and their counterparts have been explored by very few researchers for music generation. In this paper, a classical system was employed alongside a new system to generate creative music. Both systems were designed based on adversarial networks to generate music by learning from examples. The classical system was trained to learn a set of music pieces without differentiating between classes, whereas the new system was trained to learn the different composers and their styles to generate a creative music piece by deviating from the learned composers' styles. The base structure utilized was generative adversarial networks (GANs), which are capable of generating novel outputs given a set of inputs to learn from and mimic their distribution. It has been shown in previous work that GANs are limited in their original design with respect to creative outputs. Building on Creative Adversarial Networks (CAN), this work applied them in the music domain rather than the visual art domain. Additionally, unrolled CAN was introduced to prevent mode collapse. Experiments were conducted on both GAN and CAN for generating music, and their capabilities were measured in terms of deviation from the input set.

architecture, artificial intelligence, machine learning, (15 more...)

2501.00452

Country: Oceania > Australia (0.14)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

REFFLY: Melody-Constrained Lyrics Editing Model

Zhao, Songyan, Li, Bingxuan, Tian, Yufei, Peng, Nanyun

arXiv.org Artificial Intelligence, Aug-30-2024

Automatic melody-to-lyric generation aims to produce lyrics that align with a given melody. Although previous works can generate lyrics based on high-level control signals, such as keywords or genre, they often struggle with three challenges: (1) lack of controllability, as prior works are only able to produce lyrics from scratch, with little or no control over the content; (2) inability to generate fully structured songs with the desired format; and (3) failure to align prominent words in the lyrics with prominent notes in the melody, resulting in poor lyrics-melody alignment. In this work, we introduce REFFLY (REvision Framework For Lyrics), the first revision framework designed to edit arbitrary forms of plain text draft into high-quality, full-fledged song lyrics. Our approach ensures that the generated lyrics retain the original meaning of the draft, align with the melody, and adhere to the desired song structures. We demonstrate that REFFLY performs well in diverse task settings, such as lyrics revision and song translation. Experimental results show that our model outperforms strong baselines, such as Lyra (Tian et al. 2023) and GPT-4, by 25% in both musicality and text quality.

constraint, lyric, prominent note, (16 more...)

2409.00292

Country:

North America > United States > Texas (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)

GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

Lv, Ang, Tan, Xu, Lu, Peiling, Ye, Wei, Zhang, Shikun, Bian, Jiang, Yan, Rui

arXiv.org Artificial Intelligence, Sep-29-2023

Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there's a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However, previous efforts have fallen short in addressing this necessity due to limitations in their music representations and models. In this paper, we introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks." This framework encompasses a novel music representation "GETScore" and a diffusion model "GETDiff." GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time. At a training step, each track of a music piece is randomly selected as either the target or source. The training involves two processes: In the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as the ground truth; in the denoising process, GETDiff is trained to predict the masked target tokens conditioning on the source tracks. Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations. Our experiments demonstrate that the versatile GETMusic outperforms prior works proposed for certain specific composition tasks.
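The 2D layout and the forward (masking) process described above can be pictured with a small sketch. This is not the GETMusic implementation: the token strings, the `[MASK]` name, the 50% selection probability, and both function names are assumptions made purely for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the real vocabulary is not given here


def pick_targets(num_tracks, rng):
    """Randomly mark each track as a target (assumed probability 0.5);
    the remaining tracks act as sources."""
    return {i for i in range(num_tracks) if rng.random() < 0.5}


def corrupt_targets(score, target_tracks, mask_token=MASK):
    """Forward-process sketch: mask every token in the target tracks and
    leave source tracks untouched as ground truth.

    `score` is a list of rows, one row per track, one column per time step.
    """
    return [
        [mask_token] * len(row) if idx in target_tracks else list(row)
        for idx, row in enumerate(score)
    ]


# Toy score: two tracks (rows) over four time steps (columns).
toy_score = [
    ["C4", "E4", "G4", "C5"],  # track 0, e.g. a lead line
    ["C2", "C2", "G2", "G2"],  # track 1, e.g. a bass line
]
masked = corrupt_targets(toy_score, target_tracks={1})
```

A denoising model would then be trained to recover the masked row from the unmasked one; that learned part is outside the scope of this sketch.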

getmusic, getscore, representation, (15 more...)

2305.10841

Country:

North America > United States (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

The Music Note Ontology

Poltronieri, Andrea, Gangemi, Aldo

arXiv.org Artificial Intelligence, Mar-30-2023

In this paper we propose the Music Note Ontology, an ontology for modelling music notes and their realisation. The ontology addresses the relation between a note represented in a symbolic representation system, and its realisation, i.e. a musical performance. This work therefore aims to solve the modelling and representation issues that arise when analysing the relationships between abstract symbolic features and the corresponding physical features of an audio signal. The ontology is composed of three different Ontology Design Patterns (ODP), which model the structure of the score (Score Part Pattern), the note in the symbolic notation (Music Note Pattern) and its realisation (Musical Object Pattern).

artificial intelligence, information, ontology, (15 more...)

2304.00986

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
(3 more...)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

A Non-iterative Spatio-temporal Multi-task Assignments based Collision-free Trajectories for Music Playing Robots

Velhal, Shridhar, VS, Krishna Kishore, Sundaram, Suresh

arXiv.org Artificial Intelligence, Feb-17-2023

In this paper, a non-iterative spatio-temporal multi-task assignment approach is used for playing piano music by a team of robots. This paper considers the piano playing problem, in which an algorithm needs to compute the trajectories for a dynamically sized team of robots that play the musical notes by traveling through the specific locations associated with those notes at their respective times. A two-step dynamic resource allocation based on a spatio-temporal multi-task assignment problem (DREAM) has been implemented to assign robots for playing the musical tune. The algorithm computes the required number of robots to play the music in the first step. In the second step, optimal assignments are computed for the updated team of robots, which minimizes the total distance traveled by the team. Even with individually feasible trajectories, the multi-robot execution may fail if robots encounter a collision. As some time will be utilized for this conflict resolution, robots may not be able to reach the desired location on time. This paper analyses and proves that, if robots are operating in a convex region, the solution of the DREAM approach provides collision-free trajectories. The working of the DREAM approach has been illustrated with the help of high-fidelity simulations in Gazebo operated using ROS2. The result clearly shows that the DREAM approach computes the required number of robots and assigns multiple tasks to robots in at most two steps. The simulation of the robots playing music, using computed assignments, is demonstrated in the attached video. Video link: https://youtu.be/XToicNm-CO8
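The second step's objective, minimizing the total distance traveled by the team, can be illustrated with a deliberately simplified sketch. This is not the DREAM algorithm: it ignores timing, assumes one key per robot, places everything on a 1-D line, and uses brute-force search; the function name and inputs are hypothetical.

```python
from itertools import permutations


def best_assignment(robot_positions, key_positions):
    """Return (assignment, total_distance), where assignment[i] is the index
    of the key given to robot i and total_distance is the summed travel.

    Brute force over all permutations, so only suitable for tiny teams.
    """
    n = len(robot_positions)
    if n != len(key_positions):
        raise ValueError("this sketch assumes exactly one key per robot")
    best = None
    for perm in permutations(range(n)):
        cost = sum(abs(robot_positions[i] - key_positions[perm[i]]) for i in range(n))
        if best is None or cost < best[1]:
            best = (perm, cost)
    return best


# Two robots at x = 0 and x = 10; two keys at x = 9 and x = 1.
assignment, distance = best_assignment([0.0, 10.0], [9.0, 1.0])
```

Here the cheaper pairing sends each robot to its nearby key, so the paths do not cross; the paper's collision-free guarantee for convex regions is a formal result that this toy example does not reproduce.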

artificial intelligence, optimization problem, robot, (16 more...)

2210.07653

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.40)

Industry:

Transportation (1.00)
Media > Music (0.41)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)

Enhancing Audio Perception of Music By AI Picked Room Acoustics

Verma, Prateek, Berger, Jonathan

arXiv.org Artificial Intelligence, Aug-16-2022

Every sound that we hear is the result of successive convolutional operations (e.g. room acoustics, microphone characteristics, resonant properties of the instrument itself, not to mention characteristics and limitations of the sound reproduction system). In this work we seek to determine the best room in which to perform a particular piece using AI. Additionally, we use room acoustics as a way to enhance the perceptual qualities of a given sound. Historically, rooms (particularly churches and concert halls) were designed to host and serve specific musical functions. In some cases the architectural acoustical qualities enhanced the music performed there. We try to mimic this, as a first step, by designating room impulse responses that would correlate to producing enhanced sound quality for particular music. A convolutional architecture is first trained to take in an audio sample and mimic the ratings of experts with about 78% accuracy for various instrument families and notes for perceptual qualities. This gives us a scoring function for any audio sample which can rate the perceptual pleasantness of a note automatically. Now, via a library of about 60,000 synthetic impulse responses mimicking all kinds of rooms, materials, etc., we use a simple convolution operation to transform the sound as if it were played in a particular room. The perceptual evaluator is used to rank the musical sounds, and yield the "best room or the concert hall" to play a sound. As a byproduct it can also use room acoustics to turn a poor quality sound into a "good" sound.
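The "simple convolution operation" and the ranking step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the scoring function is passed in as a placeholder for the trained perceptual evaluator, and the room names and impulse responses are made up.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of a dry signal with a room impulse
    response; output length is len(signal) + len(impulse_response) - 1."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def best_room(signal, rooms, score):
    """Return the name of the room whose convolved output scores highest.

    `rooms` maps a room name to its impulse response; `score` stands in for
    the learned perceptual evaluator and is supplied by the caller.
    """
    return max(rooms, key=lambda name: score(convolve(signal, rooms[name])))


# Toy data: a two-sample "note" and two hypothetical rooms.
dry = [1.0, 2.0]
rooms = {"dry_booth": [1.0], "small_hall": [1.0, 0.5]}
wet = convolve(dry, rooms["small_hall"])
```

For real audio and a library of tens of thousands of impulse responses, an FFT-based convolution would be far faster than this quadratic loop.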

architecture, artificial intelligence, machine learning, (13 more...)

2208.07994

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.40)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Speech (0.70)

Music Generation using Three-layered LSTM

Ingale, Vaishali, Mohan, Anush, Adlakha, Divit, Kumar, Krishan, Gupta, Mohit

arXiv.org Artificial Intelligence, Jun-9-2021

This paper explores the idea of utilising Long Short-Term Memory neural networks (LSTMNN) for the generation of musical sequences in ABC notation. The proposed approach takes ABC notations from the Nottingham dataset and encodes them to be fed as input to the neural networks. The primary objective is to feed the neural networks an arbitrary note and let the network process and augment a sequence based on that note until a good piece of music is produced. Multiple calibrations were made to tune the parameters of the network for optimal generation. The output is assessed on the basis of rhythm, harmony, and grammar accuracy.
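One common way to encode ABC text for a sequence model is a character-level vocabulary, sketched below. The abstract does not say which encoding the authors used, so this scheme, the function names, and the toy tune are assumptions for illustration only.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text, vocab):
    """Turn ABC text into a list of integer ids."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Invert `encode`, turning ids back into ABC text."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


# A tiny made-up ABC fragment (header fields followed by one bar of notes).
tune = "X:1\nK:G\nGABc|"
vocab = build_vocab(tune)
ids = encode(tune, vocab)
```

A network trained on such id sequences predicts the next id given the previous ones; starting from a single seed note and sampling repeatedly yields a new sequence that `decode` turns back into ABC text.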

epoch, notation, sequence, (15 more...)

2105.09046

Country:

Asia > India > Maharashtra > Pune (0.07)
Asia > Singapore (0.05)

Genre: Research Report (0.50)

Industry:

Media > Music (0.71)
Leisure & Entertainment (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

Carr, Andrew N, Berthet, Quentin, Blondel, Mathieu, Teboul, Olivier, Zeghidour, Neil

arXiv.org Artificial Intelligence, Mar-17-2021

Self-supervised pre-training using so-called "pretext" tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations, by pre-training a model to reorder shuffled parts of the spectrogram of an audio signal, to improve downstream classification performance. We make two main contributions. First, we overcome the main challenges of integrating permutation inversions into an end-to-end training scheme, using recent advances in differentiable ranking. This was heretofore sidestepped by casting the reordering task as classification, fundamentally reducing the space of permutations that can be exploited. Our experiments validate that learning from all possible permutations improves the quality of the pre-trained representations over using a limited, fixed set. Second, we show that inverting permutations is a meaningful pretext task for learning audio representations in an unsupervised fashion. In particular, we improve instrument classification and pitch estimation of musical notes by reordering spectrogram patches in the time-frequency space.
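The data side of the permutation pretext task can be sketched in a few lines: shuffle spectrogram patches, keep the permutation as the training target, and check that inverting it restores the original order. This does not implement the differentiable ranking the paper relies on; the patch contents and function names are hypothetical.

```python
import random


def shuffle_patches(patches, rng):
    """Shuffle patches and return (shuffled, perm), where
    shuffled[k] == patches[perm[k]]."""
    perm = list(range(len(patches)))
    rng.shuffle(perm)
    return [patches[p] for p in perm], perm


def invert_permutation(perm):
    """Return inverse such that inverse[perm[k]] == k."""
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return inverse


def reorder(shuffled, perm):
    """Undo the shuffle given the permutation that produced it."""
    inverse = invert_permutation(perm)
    return [shuffled[inverse[i]] for i in range(len(perm))]


# Four toy "patches", each standing in for a slice of a spectrogram.
patches = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
shuffled, perm = shuffle_patches(patches, random.Random(0))
restored = reorder(shuffled, perm)
```

In the self-supervised setting, a model sees only `shuffled` and must predict `perm` (or its inverse); the true permutation comes for free from the shuffling, so no labels are needed.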

downstream task, permutation, representation, (14 more...)

2103.09879

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)
